
    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective, such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
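    The objective named above, average job completion time, depends heavily on the order in which jobs run. The following is a minimal sketch, not Decima's method: it assumes a single machine, all jobs arriving at time 0, and hypothetical job durations, and it compares first-in-first-out (FIFO) order against shortest-job-first, a classic heuristic that minimizes average completion time in this simplified setting. The function names are illustrative only.

```python
from typing import List, Sequence


def average_completion_time(durations: Sequence[float]) -> float:
    """Average completion time when jobs run back-to-back in the given order.

    Simplifying assumptions (hypothetical model): one machine, no job
    dependencies, and every job arrives at time 0.
    """
    if not durations:
        raise ValueError("need at least one job")
    clock = 0.0
    total = 0.0
    for d in durations:
        clock += d      # this job finishes after all earlier jobs plus itself
        total += clock  # accumulate its completion time
    return total / len(durations)


def fifo(durations: Sequence[float]) -> List[float]:
    """Run jobs in arrival order."""
    return list(durations)


def shortest_job_first(durations: Sequence[float]) -> List[float]:
    """Run the shortest jobs first."""
    return sorted(durations)


if __name__ == "__main__":
    jobs = [8.0, 1.0, 3.0, 2.0]  # hypothetical durations, in arbitrary time units
    print("FIFO:", average_completion_time(fifo(jobs)))
    print("SJF: ", average_completion_time(shortest_job_first(jobs)))
```

    With these hypothetical durations, FIFO gives an average of 10.75 time units and shortest-job-first gives 6.0. Real clusters add job dependency graphs, many executors, and continuous stochastic arrivals, none of which this toy model captures.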

    Invited review: Large-scale indirect measurements for enteric methane emissions in dairy cattle: A review of proxies and their potential for use in management and breeding decisions

    Publication history: Accepted 7 December 2016; published online 1 February 2017.
    Efforts to reduce the carbon footprint of milk production through selection and management of low-emitting cows require accurate and large-scale measurements of methane (CH4) emissions from individual cows. Several techniques have been developed to measure CH4 in a research setting, but most are not suitable for large-scale recording on farm. Several groups have explored proxies (i.e., indicators or indirect traits) for CH4; ideally these should be accurate, inexpensive, and amenable to being recorded individually on a large scale. This review (1) systematically describes the biological basis of current potential CH4 proxies for dairy cattle; (2) assesses the accuracy and predictive power of single proxies and determines the added value of combining proxies; (3) provides a critical evaluation of the relative merit of the main proxies in terms of their simplicity, cost, accuracy, invasiveness, and throughput; and (4) discusses their suitability as selection traits. The proxies range from simple and low-cost measurements, such as body weight and high-throughput milk mid-infrared spectroscopy (MIR), to more challenging measures such as rumen morphology, rumen metabolites, or microbiome profiling. Proxies based on rumen samples are generally poor to moderately accurate predictors of CH4, and are costly and difficult to measure routinely on farm. Proxies related to body weight or milk yield and composition, on the other hand, are relatively simple, inexpensive, and high throughput, and are easier to implement in practice. In particular, milk MIR, along with covariates such as lactation stage, is a promising option for predicting CH4 emission in dairy cows. No single proxy was found to accurately predict CH4, and combinations of 2 or more proxies are likely to be a better solution.
    Combining proxies can increase the accuracy of predictions by 15 to 35%, mainly because different proxies describe independent sources of variation in CH4 and one proxy can correct for shortcomings in the other(s). The most important applications of CH4 proxies are in dairy cattle management and breeding for lower environmental impact. When breeding for traits of lower environmental impact, single or multiple proxies can be used as indirect criteria for the breeding objective, but care should be taken to avoid unfavorable correlated responses. Finally, although combinations of proxies appear to provide the most accurate estimates of CH4, the greatest limitation today is the lack of robustness in their general applicability. Future efforts should therefore be directed toward developing combinations of proxies that are robust and applicable across diverse production systems and environments.
    Technical and financial support from the COST Action FA1302 of the European Union.
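    The point that combined proxies help because they capture independent sources of variation in CH4 can be illustrated with ordinary least squares regression. This is a toy sketch with made-up, standardized values, not data or a model from the review: methane is constructed as the sum of two uncorrelated hypothetical proxies, so each proxy alone explains half of the variance (R² = 0.5) while the two together explain all of it (R² = 1.0). Real proxies are noisy and partly correlated, so actual gains are smaller and have to be estimated from data.

```python
from statistics import fmean
from typing import List, Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fit_one(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Fitted values from a simple linear regression of y on one proxy."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return [my + slope * (a - mx) for a in x]


def fit_two(x1: Sequence[float], x2: Sequence[float], y: Sequence[float]) -> List[float]:
    """Fitted values from OLS on two proxies, solving the centered 2x2 normal equations."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [b - my for b in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("proxies are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    return [my + b1 * a + b2 * b for a, b in zip(c1, c2)]


if __name__ == "__main__":
    # Hypothetical standardized proxy values (e.g. one weight-related, one milk-related).
    proxy_a = [1.0, -1.0, 1.0, -1.0]
    proxy_b = [1.0, 1.0, -1.0, -1.0]
    ch4 = [2.0, 0.0, 0.0, -2.0]  # toy target: proxy_a + proxy_b
    print("proxy A only:", r_squared(ch4, fit_one(proxy_a, ch4)))
    print("proxy B only:", r_squared(ch4, fit_one(proxy_b, ch4)))
    print("combined:    ", r_squared(ch4, fit_two(proxy_a, proxy_b, ch4)))
```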

    A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities
